
Fixes to BLAST functionality and refactoring #39

Merged
merged 17 commits into from
May 7, 2024


wm75
Contributor

@wm75 wm75 commented May 1, 2024

CHANGELOG

Changes around BLAST functionality:

  • Off-target amplicons are now reported as intended in the logs
  • Off-target amplicons are now always considered last for the final
    scheme (no penalty is used, but the fact that they had BLAST matches
    gets recorded)
  • The BLAST_PENALTY config option is no longer needed and has been removed
  • Added a new off_target_amplicons column to the qPCR design, the qPCR primer and the (single/tiled) primer tsv outputs to indicate which final amplicons had BLAST hits
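
The "considered last, no penalty" behaviour for the sorted modes can be illustrated with a compound sort key. This is only a minimal sketch, not the code from this PR: the `Amplicon` class, its field names and `rank_amplicons` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Amplicon:
    # Hypothetical record; field names are assumptions for this sketch.
    name: str
    penalty: float
    off_targets: bool  # True if BLAST reported hits for this amplicon


def rank_amplicons(amplicons):
    """Sort by (off_targets, penalty).

    False sorts before True, so every amplicon without BLAST hits comes
    before every amplicon with hits, whatever their penalties are.
    The penalty itself is left untouched.
    """
    return sorted(amplicons, key=lambda a: (a.off_targets, a.penalty))


candidates = [
    Amplicon("a", 1.0, True),
    Amplicon("b", 5.0, False),
    Amplicon("c", 2.0, False),
]
ranked = rank_amplicons(candidates)
print([a.name for a in ranked])
```

Because the flag is kept next to the unchanged penalty, it can also be written out later, for example to fill a column such as off_target_amplicons.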

General reporting changes:

  • Amplicon numbering now proceeds from 5' to 3' even across pools for the tiled mode and from lowest penalty to highest for the other modes (previously, additional BLAST penalties weren't considered during penalty-sorting).
  • In the primer bed file output in tiled mode, primers are now ordered according to the amplicon number without taking the pool into account
  • In the primer bed file output in qPCR mode, oligos from the same set are now ordered LEFT, PROBE, RIGHT, i.e. by position on the reference
  • The per-base mismatch plot now uses final primer names as panel titles
  • The amplicon bed file, in all modes, is now formatted as proper six-column bed
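
For reference, a six-column BED line consists of chrom, chromStart (0-based), chromEnd (exclusive), name, score and strand, separated by tabs. The helper below is a hypothetical sketch of how such a line could be formatted; the function name, the default values and what the tool actually writes into the score column are assumptions, not taken from this PR.

```python
def bed6_line(chrom, start, end, name, score=0, strand="+"):
    """Format one BED6 record (hypothetical helper for illustration).

    start is 0-based and end is exclusive, as in the BED convention.
    """
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    if strand not in ("+", "-", "."):
        raise ValueError("strand must be '+', '-' or '.'")
    return "\t".join(str(field) for field in (chrom, start, end, name, score, strand))


print(bed6_line("ref", 10, 250, "AMPLICON_0"))
```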

Algorithmic fixes and enhancements:

  • The internal representation of amplicon schemes has been unified/simplified across the different modes and steps of the analysis
  • The search for final non-overlapping amplicons in single mode and for non-overlapping amplicons passing deltaG in qPCR mode has been optimized and is now significantly faster
  • Some off-by-one errors in internal primer and amplicon interval calculations have been fixed

wm75 added 3 commits April 30, 2024 12:06
Issues addressed here:
- Off-target amplicons are now reported as intended in the logs
- Off-target amplicons are now always considered last for the final
  scheme (no penalty is used, but the fact that they had BLAST matches
  gets recorded)
- Amplicon numbering proceeds from 5' to 3' even across pools
- In the primer bed file, primers are now ordered according to the
  amplicon number without taking the pool into account

For this, the internal representation of amplicon schemes had to be
unified across the different modes and steps of the analysis.
This ensures that the best amplicons get the lowest ID numbers in the
written scheme.
@wm75
Contributor Author

wm75 commented May 1, 2024

Once I'm finished with this, you could consider removing the blast penalty parameter.

@jonas-fuchs
Owner

Once I'm finished with this, you could consider removing the blast penalty parameter.

Wow ok, that looks great at first glance! I think your general idea for the structure is very practical. One thing we should probably clarify is the blast_penalty. I am not quite sure how we will be able to remove it. While you can use simple sorting in the qpcr and single modes without adding to the general penalty, I am not sure how this can be achieved with the graph search. The idea of the high penalty was that amplicons are preferentially used if they have not received the high BLAST penalty, but can still be used if no other amplicons are available: the cost of walking is high in this case, but that is better than not finding a scheme.

@jonas-fuchs jonas-fuchs added the enhancement New feature or request label May 1, 2024
@wm75
Contributor Author

wm75 commented May 2, 2024

I am not sure how this can be achieved with the graph search

A very fair point. Was exploring this a bit and the latest commit now uses graph edge values / distances that are tuples of (number_of_off_target_amplicons, costs).

Turns out this is somewhat more efficient in avoiding amplicons with BLAST hits:
previously an off_target amplicon would have been accepted if the costs of the path avoiding it were higher than the BLAST penalty. Now, with coverage being equal, the path with the least number of off_target amplicons is guaranteed to be chosen and will use off_target amplicons only as a last resort.
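
To illustrate the idea of tuple-valued distances, here is a minimal sketch of a Dijkstra-style search in which every edge contributes `(is_off_target, cost)` and path distances are compared lexicographically. It is not the code from the commit: the graph layout, the node names and the function `shortest_path` are assumptions made for this example, and coverage is ignored entirely.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra with distances of the form (n_off_target, cost).

    graph: {node: [(neighbor, off_target, cost), ...]} (hypothetical layout).
    Python compares tuples element by element, so a path with fewer
    off-target amplicons always wins, and cost only breaks ties.
    """
    best = {start: (0, 0.0)}
    queue = [((0, 0.0), start, [start])]
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if dist > best.get(node, dist):
            continue  # stale queue entry, a better distance is already known
        for neighbor, off_target, cost in graph.get(node, []):
            new = (dist[0] + int(off_target), dist[1] + cost)
            if neighbor not in best or new < best[neighbor]:
                best[neighbor] = new
                heapq.heappush(queue, (new, neighbor, path + [neighbor]))
    return None


# A -> B -> D is cheap but uses one off-target amplicon;
# A -> C -> D is expensive but clean.
graph = {
    "A": [("B", True, 1.0), ("C", False, 50.0)],
    "B": [("D", False, 1.0)],
    "C": [("D", False, 50.0)],
    "D": [],
}
print(shortest_path(graph, "A", "D"))
```

In this toy graph the clean path costs 100.0 and the off-target path costs 2.0. With an additive penalty of, say, 50, the off-target path would total 52.0 and be chosen; with tuple distances the clean path `(0, 100.0)` beats `(1, 2.0)`, and the off-target route is only taken when no clean route exists.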

@wm75 wm75 changed the title [WIP] Fixes to BLAST functionality and refactoring Fixes to BLAST functionality and refactoring May 3, 2024
@jonas-fuchs jonas-fuchs merged commit 2a525d6 into jonas-fuchs:master May 7, 2024
1 check passed
@wm75 wm75 deleted the unify-modes branch May 7, 2024 09:37